ABSTRACT

On many campuses, student ratings of courses and
teachers are routinely given to instructors as feedback about their
teaching. This review applies meta-analytic techniques to 30 studies
of the effectiveness of student ratings feedback. The studies met the
following criteria: (1) they used student ratings as the primary
source of feedback; (2) they investigated post-secondary instruction;
(3) they were conducted in the classroom rather than laboratory
settings; (4) they employed a control group for comparison purposes;
and (5) they stood apart from larger training programs in which the
effects of feedback are inseparable from the effects of training.
Results indicated that feedback from student ratings alone produced a
positive but small effect on subsequent ratings. When ratings were
accompanied by consultation and/or other types of feedback,
considerably larger positive effects were likely. In the few studies
of feedback's effects on student achievement and affect, results were
less clear. Future research should: (1) investigate additional
dependent variables; (2) more carefully document and investigate
feedback implementations; and (3) explore additional characteristics
of the recipients of feedback. The studies analyzed in this review
are attached, as well as a listing of those studies which did not
meet criteria for meta-analysis. (JD)
Effects of Student Evaluation Feedback:
A Meta-Analysis of Higher Education Research 



The institutionalization of course and teacher evaluations on many 
campuses in recent years has stimulated controlled research on the effects 
of feedback to teachers. In this paper, we examine how feeding back 
information about teaching affects the subsequent teaching of instructors in 
postsecondary education. Such information may come from many sources, 
but this review is limited to studies in which information for feedback
comes from student evaluations. We report a meta-analysis of 30 studies, 
most of which take the following form: Results of student evaluations, 
usually collected at midterm, are fed back to some of the participating 
teachers, and end-of-term student evaluations are examined to identify 
differences between teachers who did and did not receive feedback. 

Method 
Searching the Literature 

Computer-assisted searches were conducted on the ERIC (Education 
Resources Information Center), DAI (Dissertation Abstracts International), 
PsycINFO (Psychological Abstracts), and MEDLINE (Index Medicus) databases.
The Business Publications Index and Abstracts and the Business Periodicals
Index were searched manually. Bibliographies of the items identified in
these searches were scanned in order to locate additional pertinent 
references. 

Over 300 books, journal articles, dissertations, and unpublished reports
and papers were reviewed for consideration and examination. Seventy-one
empirical studies were identified which evaluated the effectiveness of some
form of feedback to postsecondary instructors for the purpose of improving
their teaching (Menges, Brinko, & L'Hommedieu, 1986). In 52 of the 71
studies, feedback was in the form of student ratings. 



Criteria for Inclusion in the Meta-Analysis
The 30 studies finally included in the meta-analysis are studies 1) which
investigated post-secondary instruction, 2) which used student ratings as
the primary source of feedback, 3) which were conducted in classroom rather
than laboratory settings, 4) which employed a control group for comparison
purposes, and 5) which stood apart from larger training programs in which
the effects of feedback are inseparable from the effects of training. These
studies are listed in Attachment A. Studies of student ratings feedback
which do not meet these criteria are listed in Attachment B.

Procedures for Coding and Analyzing Data 
Information about these studies was coded into 46 categories which 
describe each study along five dimensions: document characteristics, 
participant characteristics, treatment characteristics, design and analysis 
characteristics, and variable characteristics. 

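When possible, effect sizes were calculated using Glass's (Glass, McGaw,
& Smith, 1981, p. 102) formula (nine studies):

$$\Delta = \frac{\bar{X}_e - \bar{X}_c}{s_c}$$

where $\bar{X}_e$ and $\bar{X}_c$ are the means of the experimental and
control groups and $s_c$ is the standard deviation of the control group.

For studies where that formula could not be used, i.e., when F or t
statistics rather than means and standard deviations were reported, this
formula (Glass, McGaw, & Smith, 1981, p. 107) was used (nine studies):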

Where it was not possible to use either formula, e.g., when only chi
squares were reported, we made conservative estimates of effect size based
on level of significance and sample size (one study). In instances where an
effect size could neither be calculated nor estimated but where reported
results clearly indicated only small and random differences between groups,
an effect size of zero was assigned (nine studies). Where effect sizes could
neither be calculated nor estimated but where reported results indicated a
positive significant difference between groups (two studies), we substituted
an effect size equal to the mean of the effect sizes of those studies which
reported positive significant differences that were calculable or estimable.
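
The two calculations described above can be sketched as follows. This is a minimal illustration, not the authors' own procedure. The first function is Glass's formula as stated. The second converts a reported t statistic with the widely used approximation delta = t * sqrt(1/n_e + 1/n_c); that conversion is an assumption, because the second formula is not legible in this copy of the paper. All function names and input values are hypothetical.

```python
import math

def glass_delta(mean_exp, mean_ctrl, sd_ctrl):
    """Glass's delta: mean difference expressed in control-group SD units."""
    if sd_ctrl <= 0:
        raise ValueError("control-group SD must be positive")
    return (mean_exp - mean_ctrl) / sd_ctrl

def delta_from_t(t, n_exp, n_ctrl):
    """Assumed conversion from an independent-groups t statistic.

    This is a common textbook approximation, not necessarily the
    formula the authors used.
    """
    return t * math.sqrt(1.0 / n_exp + 1.0 / n_ctrl)

# Hypothetical end-of-term ratings on a 5-point scale.
d1 = glass_delta(mean_exp=3.9, mean_ctrl=3.7, sd_ctrl=0.5)
# Hypothetical study that reports only t and the two group sizes.
d2 = delta_from_t(t=2.0, n_exp=20, n_ctrl=20)
print(round(d1, 2), round(d2, 2))
```

Dividing by the control-group standard deviation keeps the scale of the effect size independent of any change in spread that the treatment itself might cause.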

Many of the studies contained multiple comparisons; for example, several 
studies compared treatment groups on each item of a questionnaire. In these
cases we chose to average effect sizes of comparisons within studies 
because averaging yielded a more straightforward and interpretable result. 
Averaging also avoids problems of unfairly weighting the results of multiple 
comparison studies and of capitalizing on sampling error. The effects of 
theoretically relevant factors and subcomparisons are discussed elsewhere 
(L'Hommedieu, Brinko, & Menges, 1986). 
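
A small sketch of the within-study averaging described above, using hypothetical effect sizes: each study is first reduced to a single mean effect size, so a study that reports many comparisons does not outweigh a study that reports one. The study labels and values are illustrative only.

```python
from statistics import mean

def study_level_effects(comparisons):
    """Average the effect sizes within each study.

    `comparisons` maps a study label to the list of effect sizes
    for that study's individual comparisons.
    """
    return {study: mean(values) for study, values in comparisons.items()}

# Hypothetical data: study "A" reports one effect size per questionnaire item.
comparisons = {"A": [0.10, 0.30, 0.50], "B": [0.20]}
per_study = study_level_effects(comparisons)

# Each study contributes exactly one value to the overall mean.
overall = mean(list(per_study.values()))

# Pooling every comparison instead lets study "A" count three times.
pooled = mean([v for values in comparisons.values() for v in values])
print(round(overall, 3), round(pooled, 3))
```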

Results 

Of the 30 studies in the meta-analysis, 10 found significant differences 
between groups, and all comparisons favored the feedback group. One study 
found mixed results. The remaining 19 studies found no significant
differences between groups. Our findings are summarized in Table 1.

Effects on Subsequent Student Ratings 
All implementations versus no feedback 

Twenty-seven studies with 31 comparisons compared instructors who 
received some form of feedback from student ratings with instructors who 
received no feedback. The average effect size was .44 with a standard 
deviation of .64 (p < .001). Thus, on the average, student ratings feedback
raised subsequent ratings by almost one-half of a standard deviation. This 
indicates that at the end of the experimental treatment, ratings of the 
average teacher in the experimental groups were higher than 67 percent of 
the teachers in the control groups. 
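
The percentile interpretation can be reproduced with a short sketch. It assumes that control-group ratings are approximately normally distributed, so that an effect size in standard deviation units maps to a percentile through the standard normal cumulative distribution. The function name is hypothetical; the effect sizes are the means reported in this review.

```python
from statistics import NormalDist

def percentile_rank(effect_size):
    """Percent of control-group teachers rated below the average treated
    teacher, assuming normally distributed control-group ratings."""
    return 100.0 * NormalDist().cdf(effect_size)

# Mean effect sizes on subsequent ratings reported in this review.
reported = {
    "all implementations": 0.44,
    "ratings alone": 0.22,
    "ratings with consultation": 1.10,
    "augmented ratings": 0.996,
}
for label, d in reported.items():
    print(f"{label}: {percentile_rank(d):.0f}th percentile")
```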

Summarizing across all studies, however, obscures important 
distinctions. These 27 studies include three distinct implementations of 
feedback: 1) student ratings feedback, e.g., statistical summaries
sometimes accompanied by interpretative texts and/or written suggestions
for improvement; 2) student ratings feedback with consultation, i.e., student
ratings feedback discussed with a consultant for one to two hours,
sometimes accompanied by verbal suggestions for improvement; and 3)
augmented student ratings feedback, i.e., student ratings feedback with
consultation accompanied by feedback from other sources, such as 
self-evaluation, peer evaluation, peer group discussions, videotape analysis, 
and so on. Below we report effect sizes separately for these three types of 
student ratings feedback implementations. 

Student ratings alone versus no feedback. (23 studies reporting 23
comparisons.) Sixteen of these studies, or 70 percent, found no significant
differences between feedback and no-feedback groups. Student ratings
feedback alone produced a very small effect (M = .22; SD = .32; p < .01).

Feedback of this type raised subsequent ratings by one-fifth of a standard 
deviation unit; in other words, after receiving student ratings feedback, the 
average instructor was rated higher than 59 percent of control group 
teachers. Thus, it appears that systematic feedback from student ratings
alone has a positive but small effect.

Student ratings with consultation versus no feedback. (Five studies
reporting five comparisons.) Student ratings feedback with consultation
produced greatly varied results. In these studies, effect sizes ranged from 0
to 2.50 (M = 1.10; SD = 1.14; n.s.). However, four of the five studies reported
significant differences, favoring the group which received student ratings 
feedback with consultation. On the average, feedback of this type raised 
subsequent ratings by more than one standard deviation unit. Thus, the 
average instructor who received student ratings feedback with consultation 
was subsequently rated higher than 86 percent of teachers in the control 
groups. We conclude that systematic student ratings feedback accompanied 
by interaction with a consultant can have large positive effects on
subsequent performance; however, variables critical to effective 
consultation are yet to be identified. 
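
The following sketch illustrates why a mean effect size above 1.0 can still fail to reach significance when only five comparisons with widely varying results are available. The standard error SD / sqrt(n - 1) is an assumption, chosen because it approximately reproduces the t values printed in Table 1; the paper does not state the test formula. The function name is hypothetical.

```python
import math

def one_sample_t(mean_effect, sd, n):
    """t statistic for testing a mean effect size against zero.

    Assumes a standard error of SD / sqrt(n - 1), inferred from the
    tabled values rather than stated by the authors.
    """
    if n < 2:
        raise ValueError("at least two comparisons are required")
    return mean_effect / (sd / math.sqrt(n - 1))

# Two-tailed .05 critical value of t with 4 degrees of freedom.
T_CRIT_DF4 = 2.776

# Ratings with consultation: M = 1.10, SD = 1.14, five comparisons.
t_consult = one_sample_t(1.10, 1.14, 5)
significant = t_consult > T_CRIT_DF4
print(round(t_consult, 2), significant)
```

With so few comparisons the critical value is high and the standard error is large, so the test has little power even for a large average effect.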

Augmented student ratings versus no feedback. (Three studies reporting
three comparisons.) Student ratings feedback with consultation augmented
by feedback from other sources produced a mean effect size of .996 (SD =
.55; n.s.). Although there are only three such studies, each showed positive
results and two reached statistical significance. Augmented feedback raised
instructor performance by one standard deviation, and ratings of these
instructors exceeded the ratings of 84 percent of control group instructors.
When student ratings feedback is accompanied by other types and sources of
feedback, the process can result in large positive effects on instructor
performance.

Augmented student ratings versus student ratings alone 

Two studies reporting three comparisons (not shown in Table 1)
investigated the effects of augmentation on subsequent student ratings. 
Both studies reported no significant differences between those receiving 
student ratings feedback alone and those receiving student ratings feedback 
with consultation augmented with other types of feedback. Because we were 
unable to calculate or estimate actual effect sizes in either study, all three
comparisons were assigned effect sizes of 0. It is interesting to note that
videotape recording was the other source of feedback in each comparison. It
may be that videotape feedback and student ratings feedback are interactive
and the effects of one add nothing to the effects of the other.

Effects on Student Achievement
All implementations versus no feedback

Three studies reporting four comparisons examined the effects of student
ratings feedback on student achievement. These studies yielded greatly
varied results (M = .25, SD = .61; n.s.). This diversity could be due to the
nature of the instruments used (one was standardized and two were locally
constructed); however, comparison of results as measured by standardized
and nonstandardized instruments yields no conclusive trends. It is also
possible that the variation in results is due to differences in teacher
expectations, e.g., teaching to the test, or to the implementation of the
feedback process. Clearly, more research is needed to determine the effect
of student ratings feedback on student achievement. Nevertheless, we report
comparisons separately for each type of feedback implemented.

Student ratings alone versus no feedback. (Two studies reporting two
comparisons.) The two studies which investigated the effects
of student ratings feedback alone on student achievement found greatly
discrepant results. One study's mean was -.53 and the other's was .94. Thus the
effect size found (M = .20, SD = 1.04; n.s.) is inconclusive. The study which
reported the mean of -.53 used a standardized measure of achievement, and
the other study which reported the mean of .94 used a nonstandardized
measure.
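
As a quick arithmetic check of this comparison, the mean and standard deviation of the two study means can be recomputed directly. Using the sample standard deviation (n - 1 denominator) is an assumption; it yields about 1.04, which agrees with the value tabled for this comparison.

```python
from statistics import mean, stdev

# The two study means reported in the text.
effects = [-0.53, 0.94]

m = mean(effects)    # about .20
sd = stdev(effects)  # sample SD with n - 1 in the denominator, about 1.04
print(f"M = {m:.3f}, SD = {sd:.3f}")
```

With only two comparisons that point in opposite directions, the standard deviation is several times the mean, which is why the result is described as inconclusive.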

Student ratings with consultation versus no feedback. (Two studies
reporting two comparisons.) Less discrepancy was found in the two studies
which compared the achievement of students of instructors who received
student ratings feedback with consultation and the achievement of students
of instructors who received no feedback (M = .30, SD = .15; n.s.). One of the
studies used a standardized measure and the other used a nonstandardized 
measure. Although the statistic did not reach significance, differences in 
both studies favored the feedback groups. 
Augmented student ratings versus student ratings alone

The only study which compared augmented student ratings feedback with 
student ratings feedback alone (not shown in Table 1) reported no significant
differences between groups on student achievement. The effect size 
calculated for this comparison is .18, a very small but positive effect. 

Effects on Student Affect
All implementations versus no feedback

Five comparisons from three different studies were located which
compared the effects of some form of student ratings feedback with a
no-feedback control group. Two of the three studies used locally constructed
instruments which measured attitude toward the subject and attitude
toward self. Both of these studies reported significant differences between
groups. The third study used a measure similar in content but standardized.
This study reported no significant differences between groups. The mean
effect size was .40 (SD = .25; p < .05). Thus the average class whose
instructor received some type of student ratings feedback scored higher on
measures of affect than 65 percent of the no-feedback control group.

However, when effects are analyzed for the three types of student ratings 
feedback, each fails to reach significance. This is not surprising given the
small number of comparisons for each type of student ratings feedback. More 
research is needed in this area in order to determine the effects of student 
ratings feedback on student affect. 

Comparison with Earlier Reviews 

Three previous reviews on this topic have appeared. In their qualitative 
review, Rotem and Glasman (1979) examined 13 empirical studies, six of 
which investigated feedback to postsecondary instructors. They concluded 
that "feedback from student ratings (as was elicited and presented to 
teachers in the studies reviewed) does not seem to be effective for the 
purpose of improving performance of university teachers" and suggested that 
"educational consulting services may be required as an integral part of 
evaluation aimed at improving teaching" (p. 507). 

The present review expands and updates Cohen's (1980) meta-analysis. 
Cohen analyzed 17 studies, all but one of which are also included in the 
present review. He found a mean effect size of .38 across all studies, a mean 
effect size of .20 for studies of student ratings alone, and a mean effect size 
of .64 for studies of student ratings feedback with consultation and/or other
types of feedback. Cohen concluded that "comparatively large effect sizes 
emerged from studies using augmented feedback or consultation in 
conjunction with student ratings feedback. Studies using only student rating
feedback produced much smaller effects. These results clearly suggest that
instructors need more than just student ratings feedback to markedly
improve their instruction" (p. 338).

In a discursive review of the literature on improving postsecondary 
instruction, Levinson-Rose and Menges (1981) noted 24 studies investigating
the effects of student ratings feedback. They concluded that "feedback from
students can positively affect subsequent teaching, particularly if ratings
are accompanied by consultation" (p. 419). They also rated each study
according to its design features as warranting low, moderate, or high
confidence and observed that "the greater our confidence in the study, the
less likely it supports the intervention" (p. 417).

Compared with these previous reviews, the larger pool of studies now 
available permits differentiation of three rather than two treatment 
implementations. Unfortunately the number of studies was quite uneven 
across implementations. Nevertheless, effects from student ratings
feedback alone are modest. When accompanied by face-to-face consultation,
however, the effects are more than quadrupled. And when accompanied by
face-to-face consultation plus other types of feedback, the effects are 
similarly strong. 

These results are counterintuitive in that we expected student ratings, 
consultation, and feedback from other sources to have additive effects. 
Instead, student ratings feedback with consultation and augmented student
ratings feedback yield approximately the same effect. We offer three
explanations for this finding. First, it may be that student ratings feedback
and feedback from other sources are interactive, that is, the effects of one
may cancel out the effects of the other. Second, the true gains made by
instructors in their efforts to improve their instruction may be masked by a
ceiling effect. Since most student ratings instruments utilize a 5-point
Likert scale, there is little room to report a wide range of improvement for
the already average or above average instructor. Third, the similarity in
results may be due to differences in types of studies. Whereas all studies of
student ratings feedback with consultation were empirical studies of the
feedback process per se, all studies of augmented feedback were evaluation 
studies of faculty development programs. This difference in focus may have 
produced different expectations in experimenters, in instructors, or in the 
students, which in turn produced differential results. 

Since the present review is the most recent and the largest quantitative 
synthesis, it enables us to ask whether more recent studies differ from 
earlier studies in their findings and in their design. We divided the studies 
into two groups: those reported prior to 1980, the year Cohen's meta- 
analysis appeared, and those reported during or since 1980. With regard to 
results, we found a marked difference: The mean effect size of studies 
reported since 1980 (seven studies reporting nine comparisons; M = .83) is
three times as large as the effect size of the pre-1980 group (20 studies
reporting 22 comparisons; M = .28). However, the variability of the later
group (SD = .91) is about twice as great as that of the pre-1980 group (SD = .41).
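
The pre-1980 and post-1980 figures can be checked against the overall result with a short calculation. This sketch only recombines numbers reported in this review; the dictionary layout and variable names are illustrative. Weighting each group mean by its number of comparisons should recover the overall mean effect size of about .44 across 31 comparisons.

```python
# Group summaries as reported in the text.
groups = {
    "pre-1980": {"comparisons": 22, "mean": 0.28, "sd": 0.41},
    "1980 and later": {"comparisons": 9, "mean": 0.83, "sd": 0.91},
}

total = sum(g["comparisons"] for g in groups.values())
weighted_mean = sum(g["comparisons"] * g["mean"] for g in groups.values()) / total

# Ratios between the later and the earlier group.
mean_ratio = groups["1980 and later"]["mean"] / groups["pre-1980"]["mean"]
sd_ratio = groups["1980 and later"]["sd"] / groups["pre-1980"]["sd"]

print(total, round(weighted_mean, 2), round(mean_ratio, 1), round(sd_ratio, 1))
```

The ratio of standard deviations is a little over 2, while the ratio of means is close to 3, so the later studies show both larger and less consistent effects.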

With regard to design, studies appearing in 1980 or later are no more
likely than earlier studies to employ feedback from sources other than
student ratings (what we have termed augmented feedback) or to define
impact beyond student ratings, e.g., using measures of student affect or
achievement. 

What Major Questions Remain? 

We have found that across studies, the effects of feedback from student 
evaluations are evident in subsequent student ratings, especially when 
feedback from students is augmented with consultation or with consultation 
plus feedback from other sources. These findings are quite clear, and we 
contend that additional studies like those of recent years would be 
redundant. Instead, studies should be refined to deal with three problem 
areas apparent in this literature.

1. The first area concerns dependent variables. The dependent variables
of student achievement and student affect should have additional research 
attention, given the inconclusive findings so far. Other indicators of the 
impact of feedback should also be studied, including comments by students 
on open-ended evaluations or in interviews, changes in faculty and student 
attitudes and values, modifications of course materials, and so on. 

A more fundamental problem stems from the nature of student evaluation 
instruments. Most are based on unarticulated theories of teaching
effectiveness, a problem which Abrami (1985) among others has called to
our attention. While such theory is being developed, researchers can take the
interim step of closely matching dependent measures to the content of
feedback. For example, if feedback includes information on classroom
interactions, then student evaluation items should elicit data on that topic.
Few investigators appear to have designed dependent measures in this way,
and this omission reduces the sensitivity of their measures.

Finally, impact of feedback over time with repeated interventions should 
be investigated more carefully. 

2. The second problem area is the nature of feedback implementations. 
In most of these reports, feedback as a treatment is described only
sketchily. Few details are given about how information is communicated.
Investigators seldom verify even that feedback has been received, leading us
to wonder how much of the effects of consultation occur merely because
consultation ensures that feedback is actively attended to and processed.

What goes on during consultation is also inadequately reported. We 
wonder which content is emphasized and how much emphasis is given to 
interpretation, to diagnosis, and to suggested correctives. We also wonder 
whether it makes a difference if faculty members themselves control 
decisions, i.e., if the recipient decides whether or not to receive feedback,
what the content of that feedback should be, and what assistance is needed 
to interpret information and to plan for change.

Finally, other treatments should be investigated. There might be
assessments of such sources of feedback as videotapes, peer observers,
interaction analysis, and so on, relative to student ratings, with and without
consultation. Such treatments might also be tried in combination, given
adequate theoretical justification and appropriately sensitive measures.

3. Characteristics of the recipient of feedback comprise the third
problem area. Some faculty are surely more ready to use feedback than
others. Several studies suggest that one indicator of readiness is
discrepancy between self-evaluation and student evaluation, at least when that
discrepancy is moderate. Other variables deserving further study include
objective characteristics such as sex, years of experience, and tenure status
(as a possible indicator of professional vulnerability).

Individual differences more closely related to pedagogy include
professors' own cognitive styles and learning styles; their definitions of the
teaching role, e.g., the relative importance given to content coverage versus
student outcomes; and the priorities they assign to their own professional
responsibilities, e.g., to teaching, scholarship, and service.

Psychological variables which may yield interesting results include
efficacy (Bandura, 1977), self-monitoring (Snyder, 1985), and causality
orientation (Deci and Ryan, 1985). Researchers informed by the "theory of
reasoned action" (Ajzen, 1983) might investigate 1) the teachers' own
attitudes toward the changed behavior and 2) teachers' beliefs that others
who are important to them think they should change their behavior.

Conclusion 

It is apparent that feedback from students, when augmented with 
consultation or other types of feedback, can powerfully influence subsequent 
teaching. But much remains to be learned about the extent of the impact of
feedback, about the details and dynamics of the feedback process, and about
the characteristics of those most receptive to feedback. 
Table 1: Meta-Analysis of Feedback Versus No Feedback Studies

                                 FEEDBACK IMPLEMENTATION

DEPENDENT     All imple-       Ratings alone    Ratings with     Ratings with con-
MEASURE       mentations                        consultation     sultation augmented
                                                                 by other feedback

Subsequent    31 comparisons   23 comparisons   5 comparisons    3 comparisons
Ratings       M = .44          M = .22          M = 1.10         M = .996
              t = 3.78***      t = 3.33**       t = 1.927        t = 2.56
              67th %ile        59th %ile        86th %ile        84th %ile

Achievement   4 comparisons    2 comparisons    2 comparisons    No comparisons
              M = .25          M = .20          M = .30          located
              SD = .61         SD = 1.04        SD = .15
              t = .71          t = .20          t = 2.00
              60th %ile        58th %ile        62nd %ile

Affect        5 comparisons    2 comparisons    2 comparisons    1 comparison
              M = .40          M = .24          M = .42          M = .672
              SD = .25         SD = .12         SD = .34
              t = 3.19*        t = 2.00         t = 1.25
              66th %ile        60th %ile        66th %ile        75th %ile

*** p < .001
 ** p < .01
  * p < .05